Increase balance package test coverage #281
talgalili wants to merge 1 commit into facebookresearch:main
Conversation
@talgalili has exported this pull request. If you are a Meta employee, you can view the originating Diff in D90946146.
Pull request overview
This PR increases test coverage for the balance package by adding tests for previously untested edge cases across multiple modules. The changes target 7 files with coverage gaps, focusing on exception handling, diagnostics, convergence warnings, plotting functions with various parameters, and distance metric edge cases.
Changes:
- Added tests for CLI exception handling paths when weighting fails, batch processing, and IPW model parameter passing
- Added tests for Sample class design effect diagnostics and IPW model parameters in diagnostics output
- Added tests for CBPS optimization convergence warnings and constraint violation exceptions using mocking
- Added tests for plotting functions covering default parameter behavior, missing value handling, and various dist_types
- Added tests for distance metric functions with empty numeric columns after filtering
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| tests/test_weighted_comparisons_plots.py | Tests for seaborn_plot_dist default names/dist_types, missing value handling, plotly_plot_density without weights, plotly_plot_qq with plot_it=True, and unsupported dist_type error handling |
| tests/test_stats_and_plots.py | Tests for empty numeric column handling in wasserstein_distance, cramer_von_mises, and kolmogorov_smirnov; tests for conf_int_of_weighted_mean with DataFrame and relative_frequency_table with Series inputs |
| tests/test_sample.py | Tests for weight_trimming_percentile in str, design_effect_diagnostics edge cases, and IPW model parameters in diagnostics output |
| tests/test_cli.py | Tests for CLI exception handling with succeed_on_weighting_failure flag, IPW method with model kwargs, batch column processing, and main() function |
| tests/test_cbps.py | Tests for CBPS optimization convergence warnings and constraint violation exceptions using unittest.mock to control scipy.optimize return values |
tests/test_cbps.py
Outdated
```python
result.__getitem__ = lambda self, key: {
    "success": np.bool_(True),
    "message": "Optimization terminated successfully",
    "x": x0,
    "fun": 0.01,
}[key]
```
The lambda assigned to `__getitem__` defines a parameter named `self`, which shadows the enclosing test method's `self`. This works but is confusing. Consider a different parameter name (e.g., `obj` or `_`) to avoid the shadowing: `result.__getitem__ = lambda obj, key: {...}[key]`
tests/test_cbps.py
Outdated
```python
try:
    balance_cbps.cbps(
        sample_df,
        sample_weights,
        target_df,
        target_weights,
        transformations=None,
        cbps_method="over",
    )
except Exception:
    pass
```
The test swallows all exceptions with a blanket `except Exception: pass`, which could hide unexpected failures and make debugging difficult. Be specific about which exceptions are expected, or remove the try/except entirely if `assertLogs` already handles the case.
Suggested change (drop the try/except and call `cbps` directly):
```python
balance_cbps.cbps(
    sample_df,
    sample_weights,
    target_df,
    target_weights,
    transformations=None,
    cbps_method="over",
)
```
tests/test_cbps.py
Outdated
```python
try:
    balance_cbps.cbps(
        sample_df,
        sample_weights,
        target_df,
        target_weights,
        transformations=None,
        cbps_method="over",
    )
except Exception:
    pass
```
The test swallows all exceptions with a blanket `except Exception: pass`, which could hide unexpected failures and make debugging difficult. Be specific about which exceptions are expected, or remove the try/except entirely if `assertLogs` already handles the case.
Suggested change (drop the try/except and call `cbps` directly):
```python
balance_cbps.cbps(
    sample_df,
    sample_weights,
    target_df,
    target_weights,
    transformations=None,
    cbps_method="over",
)
```
tests/test_cbps.py
Outdated
```python
result.__getitem__ = lambda self, key: {
    "success": np.bool_(False),
    "message": "Maximum iterations reached",
    "x": x0,
    "fun": 100.0,
}[key]
```
The lambda assigned to `__getitem__` defines a parameter named `self`, which shadows the enclosing test method's `self`. This works but is confusing. Consider a different parameter name (e.g., `obj` or `_`) to avoid the shadowing: `result.__getitem__ = lambda obj, key: {...}[key]`
tests/test_cbps.py
Outdated
```python
successful_result.__getitem__ = lambda self, key: {
    "success": np.bool_(True),
    "message": "Optimization terminated successfully",
    "x": np.array([0.1, 0.1]),
    "fun": 0.01,
}[key]
```
The lambda assigned to `__getitem__` defines a parameter named `self`, which shadows the enclosing test method's `self`. This works but is confusing. Consider a different parameter name (e.g., `obj` or `_`) to avoid the shadowing throughout this test class.
| """Test seaborn_plot_dist uses plot_qq for numeric variables with qq dist_type. | ||
|
|
||
| Verifies line 714 in weighted_comparisons_plots.py. | ||
| """ |
The test uses np.random.randn without setting a seed, which could lead to non-deterministic test behavior. Consider adding np.random.seed before generating random data to ensure reproducibility and test stability.
| """ | |
| """ | |
| np.random.seed(0) |
```python
from balance.stats_and_plots.weighted_comparisons_plots import plotly_plot_qq

df = pd.DataFrame({"v1": np.random.randn(50), "weight": np.ones(50)})
```
The test uses np.random.randn without setting a seed, which could lead to non-deterministic test behavior. Consider adding np.random.seed before generating random data to ensure reproducibility and test stability.
| """ | ||
| from balance.stats_and_plots.weighted_comparisons_plots import plotly_plot_dist | ||
|
|
||
| df = pd.DataFrame({"v1": np.random.randn(30), "weight": np.ones(30)}) |
The test uses np.random.randn without setting a seed, which could lead to non-deterministic test behavior. Consider adding np.random.seed before generating random data to ensure reproducibility and test stability.
tests/test_cbps.py
Outdated
```python
self.assertTrue(
    len(log_context.records) > 0,
```
assertTrue(a > b) cannot provide an informative message. Using assertGreater(a, b) instead will give more informative messages.
Suggested change:
```python
self.assertGreater(
    len(log_context.records),
    0,
```
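As a quick standalone illustration of why the richer assertion helps (a sketch using a throwaway `unittest.TestCase` subclass, not code from this PR), compare the failure messages of the two styles:

```python
import unittest

class _Demo(unittest.TestCase):
    # Dummy test method so the class can be instantiated directly.
    def runTest(self) -> None:
        pass

tc = _Demo()

# assertTrue(a > b) only sees the collapsed boolean:
try:
    tc.assertTrue(0 > 1)
except AssertionError as e:
    true_msg = str(e)

# assertGreater(a, b) reports both operands:
try:
    tc.assertGreater(0, 1)
except AssertionError as e:
    greater_msg = str(e)

print(true_msg)     # "False is not true"
print(greater_msg)  # "0 not greater than 1"
```

The second message tells you the actual count that failed the comparison, which is exactly what you want when a log-record assertion trips in CI.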
```python
self.assertTrue(os.path.isfile(diagnostics_out_file))

output_df = pd.read_csv(out_file)
self.assertTrue(len(output_df) > 0)
```
assertTrue(a > b) cannot provide an informative message. Using assertGreater(a, b) instead will give more informative messages.
Suggested change:
```python
self.assertGreater(len(output_df), 0)
```
@copilot open a new pull request to apply changes based on the comments in this thread
Summary: The coverage report showed 98% overall coverage with 7 files having gaps. These tests cover previously untested edge cases including:
- CLI exception handling paths for weighting failures
- Sample class design effect diagnostics and IPW model parameters
- CBPS optimization convergence warnings and constraint violation exceptions
- Plotting functions with missing values, default parameters, and various dist_types
- Distance metrics with empty numeric columns

Differential Revision: D90946146
964cf08 to 7bcc6c2 — Compare
7bcc6c2 to 3695420 — Compare
3695420 to a05c33e — Compare
tests/test_cbps.py
Outdated
```python
def test_gmm_loss_bal_init_convergence_failure_warning(self) -> None:
    """Test line 765: Warning when gmm_loss with beta_balance fails.

    Uses mocking to simulate the second gmm_loss optimization failing.
```
assertTrue(a > b) cannot provide an informative message. Using assertGreater(a, b) instead will give more informative messages.
tests/test_cbps.py
Outdated
```python
def test_gmm_loss_glm_init_convergence_failure_warning(self) -> None:
    """Test line 747: Warning when gmm_loss with gmm_init fails.
```
'except' clause does nothing but pass and there is no explanatory comment.
tests/test_cbps.py
Outdated
| or "convergence" in r.getMessage().lower() | ||
| ] | ||
| self.assertTrue( | ||
| len(gmm_warnings) > 0 or len(log_context.records) > 0, |
'except' clause does nothing but pass and there is no explanatory comment.
| "success": np.bool_(True), | ||
| "message": "Success", | ||
| "x": np.array([1.0]), | ||
| } |
'except' clause does nothing but pass and there is no explanatory comment.
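A hedged sketch of the pattern the reviewer is asking for — catch only the expected exception type, comment why it is swallowed, and let `assertLogs` plus `assertGreater` do the real verification. `risky` and the logger name here are illustrative stand-ins, not code from the balance package:

```python
import logging
import unittest

logger = logging.getLogger("demo_cbps")

def risky() -> None:
    # Stand-in for an optimization call that logs a warning, then raises.
    logger.warning("Did not converge to a solution satisfying the constraints")
    raise ValueError("constraint violation")

class DemoTest(unittest.TestCase):
    def test_warning_is_logged(self) -> None:
        with self.assertLogs("demo_cbps", level="WARNING") as log_context:
            try:
                risky()
            except ValueError:
                # Expected: risky() signals failure by raising; the warning
                # captured by assertLogs is the real assertion here.
                pass
        self.assertGreater(len(log_context.records), 0)

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(DemoTest)
)
print(result.wasSuccessful())  # True
```

Narrowing to `ValueError` (or whatever the code under test actually raises) means any unrelated exception still fails the test loudly, and the comment documents that the swallowing is intentional.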
a05c33e to 6b83686 — Compare
6b83686 to 23da39f — Compare
```python
class TestRelativeFrequencyTableSeries(balance.testutil.BalanceTestCase):
    """Test cases for relative_frequency_table with Series (lines 677, 683)."""

    def test_relative_frequency_table_with_series(self) -> None:
        """Test relative_frequency_table with Series input.

        Verifies lines 676-683 in weighted_stats.py.
        """
        from balance.stats_and_plots.weighted_stats import relative_frequency_table

        s = pd.Series(["a", "a", "b", "c"])
        result = relative_frequency_table(s)
        self.assertIsInstance(result, pd.DataFrame)
        self.assertIn("prop", result.columns)

    def test_relative_frequency_table_with_unnamed_series(self) -> None:
        """Test relative_frequency_table with unnamed Series.

        Verifies lines 679-680 in weighted_stats.py.
        """
        from balance.stats_and_plots.weighted_stats import relative_frequency_table

        s = pd.Series(["a", "a", "b"])
        s.name = None
        result = relative_frequency_table(s)
        self.assertIn("group", result.columns)

    def test_relative_frequency_table_raises_on_invalid_type(self) -> None:
        """Test relative_frequency_table raises on invalid input type.

        Verifies lines 682-683 in weighted_stats.py.
        """
        from balance.stats_and_plots.weighted_stats import relative_frequency_table

        with self.assertRaises((TypeError, AttributeError)):
            relative_frequency_table([1, 2, 3])  # type: ignore
```
This test class duplicates the exact same tests already present in the TestDescriptiveStatsEdgeCases class (lines 2260-2293). The test_relative_frequency_table_with_series, test_relative_frequency_table_with_unnamed_series, and test_relative_frequency_table_raises_on_invalid_type tests are identical duplicates and should be removed to avoid test redundancy.
Suggested change: delete the duplicated `TestRelativeFrequencyTableSeries` class and its three tests (the code shown above).
tests/test_cbps.py
Outdated
```python
result.__getitem__ = lambda self, key: {
    "success": np.bool_(False),
    "message": "Did not converge to a solution satisfying the constraints",
    "x": x0,
    "fun": 100.0,
}[key]
```
The lambda used in mock_minimize has a shadowing issue: its `self` parameter shadows the enclosing test method's `self`, which can lead to confusion. Consider a different approach, such as `result.configure_mock`, setting attributes directly on the MagicMock, or delegating `__getitem__` to a plain dict instead of overriding it with a lambda.
tests/test_cbps.py
Outdated
```python
    result.__getitem__ = lambda self, key: {
        "success": np.bool_(True),
        "message": "Success",
        "x": x0,
        "fun": 1.0,
    }[key]
else:
    # Both GMM optimizations fail with constraint violation
    result.__getitem__ = lambda self, key: {
        "success": np.bool_(False),
        "message": "Did not converge to a solution satisfying the constraints",
        "x": x0,
        "fun": 100.0,
    }[key]
```
The lambda used in mock_minimize has a shadowing issue: its `self` parameter shadows the enclosing test method's `self`, which can lead to confusion. Consider a different approach, such as `result.configure_mock`, setting attributes directly on the MagicMock, or delegating `__getitem__` to a plain dict instead of overriding it with a lambda.
Suggested change (delegate `__getitem__` to a plain dict):
```python
    result_data = {
        "success": np.bool_(True),
        "message": "Success",
        "x": x0,
        "fun": 1.0,
    }
    result.__getitem__.side_effect = result_data.__getitem__
else:
    # Both GMM optimizations fail with constraint violation
    result_data = {
        "success": np.bool_(False),
        "message": "Did not converge to a solution satisfying the constraints",
        "x": x0,
        "fun": 100.0,
    }
    result.__getitem__.side_effect = result_data.__getitem__
```
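For context, here is a minimal self-contained sketch of the delegation approach (plain Python values instead of numpy, and names are illustrative): a `MagicMock` whose `__getitem__` forwards to a real dict, so no lambda parameter shadows `self`:

```python
from unittest.mock import MagicMock

# Fake scipy.optimize-style result object, subscriptable like a dict.
result = MagicMock()
result_data = {
    "success": True,
    "message": "Optimization terminated successfully",
    "x": [0.1, 0.1],
    "fun": 0.01,
}
# Delegate item access to the dict -- no lambda, no shadowed `self`.
result.__getitem__.side_effect = result_data.__getitem__

print(result["success"])  # True
print(result["fun"])      # 0.01
```

Because `side_effect` is the dict's own bound `__getitem__`, a lookup for a missing key also raises `KeyError` naturally, which is usually closer to the real object's behavior than a hand-rolled lambda.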
```python
df1 = pd.DataFrame({"v1": np.random.randn(50)})
df2 = pd.DataFrame({"v1": np.random.randn(50)})
```
Tests should be deterministic and stable. Set a random seed before calling np.random.randn() to ensure reproducible test results. This aligns with the project's guideline to "avoid order/time dependence and uncontrolled randomness."
Suggested change (use a seeded generator):
```python
rng = np.random.RandomState(0)
df1 = pd.DataFrame({"v1": rng.randn(50)})
df2 = pd.DataFrame({"v1": rng.randn(50)})
```
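The determinism point generalizes beyond numpy. A stdlib-only sketch of the same idea (seeded generators replay identical draws, so assertions built on them are stable from run to run):

```python
import random

# Two generators with the same seed produce identical sequences,
# mirroring np.random.RandomState(0) / np.random.default_rng(0).
rng1 = random.Random(0)
rng2 = random.Random(0)
draws1 = [rng1.random() for _ in range(5)]
draws2 = [rng2.random() for _ in range(5)]
print(draws1 == draws2)  # True: reproducible test data

# By contrast, module-level random.random() with no fixed seed differs
# between runs, which is exactly what makes unseeded tests flaky.
```

Per-test generators (rather than a global seed) also keep tests independent of execution order, matching the "avoid order/time dependence" guideline cited in the review.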
23da39f to 48da792 — Compare
48da792 to cef35ea — Compare
90a73af to 6902e37 — Compare
6902e37 to 5397433 — Compare
tests/test_cbps.py
Outdated
```python
self.assertTrue(
    len(log_context.records) > 0,
```
assertTrue(a > b) cannot provide an informative message. Using assertGreater(a, b) instead will give more informative messages.
Suggested change:
```python
self.assertGreater(
    len(log_context.records),
    0,
```
Summary: Adds comprehensive tests covering previously untested edge cases identified in the coverage report. The 7 files with coverage gaps now have tests for:
- **CLI**: Exception handling paths for weighting failures
- **Sample class**: Design effect diagnostics and IPW model parameters
- **CBPS**: Optimization convergence warnings and constraint violation exceptions
- **Plotting**: Functions with missing values, default parameters, and various dist_types
- **Distance metrics**: Empty numeric columns edge case

This improves the overall test coverage reliability and catches potential regressions in error handling paths.

Differential Revision: D90946146
5397433 to 007885b — Compare
007885b to 6784511 — Compare
6784511 to 6d5091d — Compare
6d5091d to 8106d96 — Compare
8106d96 to befdae7 — Compare
FYI @neuralsorcerer we got to 100% test coverage!
Thank you for the work, @talgalili! Now that we've reached 100% test coverage, what do you think about moving our core computations to Rust for better performance? We could use maturin + pyo3 bindings to compile it into a Python package.
@neuralsorcerer I have no idea how that would 'play' with Meta's internal systems. I have a longer roadmap in mind (roughly outlined in milestones 0.16 and above), but I need to get internal Meta alignment on it. If you are looking for a next-step issue, please reach out via email and let's discuss ordering. 🙇
This pull request has been merged in d0469bc.

Summary:
The coverage report showed 98% overall coverage with 7 files having gaps. These tests cover previously untested edge cases including:
Differential Revision: D90946146